Complementary attention diversity feature fusion network for fine-grained classification
Authors
Abstract
Objective  Fine-grained classification aims to divide a basic-level category into more detailed subcategories, for example distinguishing among species of wild birds or models of vehicles. Because inter-class differences are subtle while intra-class variation is large, it is challenging to capture the specific regions that drive classification. Networks built on the Transformer architecture perform very well in image classification, yet the attention mechanism tends to focus only on the most salient features of an image and to ignore the secondary salient information in other regions; self-attention-based Transformers are no exception. To obtain more effective information and learn more discriminative representations from distinctive latent features, we propose a complementary attention diversity feature fusion network (CADF), which enhances attention to feature diversity by attending to sub-salient features and jointly encoding channel and spatial features.
Method  CADF consists of two modules: a potential feature module (PFM) and a diversity feature fusion module (DFFM). The PFM first obtains salient features by aggregating regions of interest across the spatial and channel dimensions, then suppresses their saliency to force the network to mine latent features, which sharpens the network's perception of subtle discriminative cues. The DFFM explores correlations among features and models the interactions between features of different scales, producing richer complementary information and therefore stronger fine-grained representations. The network is trained end to end and requires neither bounding boxes nor multi-stage training. The model is implemented in PyTorch and trained on an NVIDIA 2080Ti GPU. Its weights are initialized from a Swin Transformer pre-trained on ImageNet. Optimization uses the AdamW optimizer with a momentum of 0.9 and a cosine annealing scheduler. The batch size is set to 6, the learning rate is 0.0001 for the backbone and 0.01 for the newly added layers, and the weight decay is 0.05. During training, the input image is resized to 550 × 550 pixels, randomly cropped to 448 × 448, and further augmented with random horizontal flips; at test time a center crop is used. The hyper-parameters are set to λ = 1 and β = 0.5.
Result  Experiments are carried out on four benchmark datasets: CUB-200-2011 (Caltech-UCSD Birds-200-2011), Stanford Dogs, Stanford Cars, and FGVC-Aircraft (fine-grained visual classification of aircraft). The accuracy reaches 92.6%, 94.5%, 95.3%, and 93.5%, respectively, outperforming current mainstream methods and showing good performance across multiple datasets. Ablation experiments verify the effectiveness of both PFM and DFFM: adding PFM alone to the Swin-B backbone improves accuracy by 1.4%, 1.4%, and 0.8% on CUB-200-2011, Stanford Dogs, and Stanford Cars, and adding DFFM alone (Swin-B + DFFM) also improves accuracy on each dataset. These results indicate that CADF has strong discriminative ability and that its structure can potentially generalize to other datasets. Visualizations further show, intuitively, the regions the network attends to.
Conclusion  To address the insufficient feature extraction of attention-mechanism-based methods, the proposed CADF uses complementary attention to effectively increase feature diversity, capturing as many discriminative features as possible and making classification more accurate.
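The training recipe in the Method section maps onto a standard PyTorch setup. The sketch below only illustrates that configuration under stated assumptions; it is not the authors' released code. A torchvision Swin-B backbone and a plain linear head stand in for the CADF modules (the abstract does not specify the internals of PFM and DFFM), the epoch count is assumed, and the abstract's "momentum 0.9" is read as AdamW's β1.

```python
# Minimal training-setup sketch for the configuration described in the abstract.
# Stand-ins and assumptions are marked in the comments below.
import torch.nn as nn
from torch.optim import AdamW
from torch.optim.lr_scheduler import CosineAnnealingLR
from torchvision import transforms
from torchvision.models import swin_b, Swin_B_Weights

# ImageNet-pretrained Swin-B stands in for the backbone; a plain linear head stands in
# for the PFM/DFFM modules described in the paper.
backbone = swin_b(weights=Swin_B_Weights.IMAGENET1K_V1)
backbone.head = nn.Identity()            # drop the 1000-class ImageNet classifier
head = nn.Linear(1024, 200)              # e.g. 200 classes for CUB-200-2011

# Two parameter groups: 1e-4 for the pretrained backbone, 1e-2 for newly added layers;
# AdamW with beta1 = 0.9 and weight decay 0.05, plus a cosine annealing schedule.
optimizer = AdamW(
    [
        {"params": backbone.parameters(), "lr": 1e-4},
        {"params": head.parameters(), "lr": 1e-2},
    ],
    betas=(0.9, 0.999),
    weight_decay=0.05,
)
epochs = 100                             # assumed; the abstract does not state the epoch count
scheduler = CosineAnnealingLR(optimizer, T_max=epochs)

# Training augmentation: resize to 550 x 550, random 448 x 448 crop, random horizontal flip.
train_tf = transforms.Compose([
    transforms.Resize((550, 550)),
    transforms.RandomCrop(448),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])
# Test time: same resize followed by a 448 x 448 center crop.
test_tf = transforms.Compose([
    transforms.Resize((550, 550)),
    transforms.CenterCrop(448),
    transforms.ToTensor(),
])
```

DataLoaders built on these transforms would use batch_size=6, as stated in the abstract; the λ = 1 and β = 0.5 hyper-parameters belong to the paper's loss formulation, which the abstract does not detail.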
Similar resources
Detecting Visually Relevant Sentences for Fine-Grained Classification
Detecting discriminative semantic attributes from text which correlate with image features is one of the main challenges of zero-shot learning for fine-grained image classification. Particularly, using full-length encyclopedic articles as textual descriptions has had limited success, one reason being that such documents contain many non-visual or unrelated sentences. We propose a method to auto...
Incremental Fine-grained Information Status Classification Using Attention-based LSTMs
Information status plays an important role in discourse processing. According to the hearer’s common sense knowledge and his comprehension of the preceding text, a discourse entity could be old, mediated or new. In this paper, we propose an attention-based LSTM model to address the problem of fine-grained information status classification in an incremental manner. Our approach resembles how hum...
Content Specific Feature Learning for Fine-Grained Plant Classification
We present the plant classification system submitted by the QUT RV team to the LifeCLEF 2015 plant task. Our system learns a content specific feature for various plant parts such as branch, leaf, fruit, flower and stem. These features are learned using a deep convolutional neural network. Experiments on the LifeCLEF 2015 plant dataset show that the proposed method achieves good performance with...
Attention for Fine-Grained Categorization
This paper presents experiments extending the work of Ba et al. (2014) on recurrent neural models for attention into less constrained visual environments, beginning with fine-grained categorization on the Stanford Dogs data set. In this work we use an RNN of the same structure but substitute a more powerful visual network and perform large-scale pre-training of the visual network outside of the...
Is Fine Grained Classification Different?
We performed experiments on two fine-grained classification tasks using a state-of-the-art pipeline (descriptor + dictionary + LLC encoding + max pooling + linear SVM). We found that this standard pipeline out-performed a dictionary-free classification technique (stacked evidence trees) that was specifically designed for fine-grained classification. The success of the method depends on two fact...
Journal
Journal title: Journal of Image and Graphics
Year: 2023
ISSN: 1006-8961
DOI: https://doi.org/10.11834/jig.220295